A comparison of pronunciation modeling approaches for HMM-TTS
Authors
Abstract
Hidden Markov model-based text-to-speech (HMM-TTS) systems are often trained on manual phonetic transcriptions of the voice corpus. Because these manual pronunciations cannot be predicted with complete accuracy at synthesis time, the result is a mismatch between training and synthesis. This paper proposes an alternative approach: a set of manually written post-lexical effects (PLE) rules, modeling a range of continuous speech effects, is applied to canonical lexicon pronunciations, and the resulting matched PLE phone sequences are used both in the voice corpus markup and at synthesis time. For a US English system, a subjective evaluation showed that a system trained on matched PLE markup and a system trained on manual phone markup were equally preferred. This suggests that it may be possible to replace manual pronunciations with matched PLE pronunciations, dramatically decreasing the time and cost required to produce an HMM-TTS voice.
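The idea of matched PLE markup can be illustrated with a minimal sketch. The abstract does not list the actual rules, so the two rules below (intervocalic /t/ flapping and consonant degemination), the ARPAbet-style phone symbols without stress marks, and all function names are assumptions chosen only for illustration. The point shown is that one deterministic rule pipeline is applied to canonical lexicon pronunciations both when labeling the voice corpus and at synthesis time, so the two phone sequences cannot diverge.

```python
# Hypothetical sketch: these rules and phone symbols are illustrative
# assumptions, not the PLE rule set used in the paper.

VOWELS = {"AA", "AE", "AH", "AO", "AX", "EH", "ER", "IH", "IY", "OW", "UH", "UW"}


def is_vowel(phone):
    """Return True if the phone is in the (assumed) vowel inventory."""
    return phone in VOWELS


def flap_t(phones):
    """Rewrite T as the flap DX when it sits between two vowels."""
    out = list(phones)
    for i in range(1, len(phones) - 1):
        if phones[i] == "T" and is_vowel(phones[i - 1]) and is_vowel(phones[i + 1]):
            out[i] = "DX"
    return out


def degeminate(phones):
    """Collapse two identical adjacent consonants into one."""
    out = []
    for phone in phones:
        if out and out[-1] == phone and not is_vowel(phone):
            continue  # skip the repeated consonant
        out.append(phone)
    return out


# Rules are applied in a fixed order so the output is deterministic.
PLE_RULES = [flap_t, degeminate]


def apply_ple(canonical_phones):
    """Apply every rule in order to a canonical lexicon phone sequence."""
    phones = list(canonical_phones)
    for rule in PLE_RULES:
        phones = rule(phones)
    return phones


# The same function produces both the corpus markup and the synthesis input.
corpus_markup = apply_ple(["B", "AH", "T", "ER"])
synthesis_input = apply_ple(["B", "AH", "T", "ER"])
```

Because `apply_ple` is a pure function of the canonical pronunciation, the training labels and the synthesis-time phone sequence match by construction, which is the property the abstract attributes to the matched PLE approach. Real post-lexical rules would also depend on word boundaries, stress, and speaking style, all of which this sketch omits.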
Similar resources
Speaker adaptation using a parallel phone set pronunciation dictionary for Thai-English bilingual TTS
This paper develops a bilingual Thai-English TTS system from two monolingual HMM-based TTS systems. An English Nagoya HMM-based TTS system (HTS) provides correct pronunciations of English words, but its voice differs from the voice in a Thai HTS system. We apply a CSMAPLR adaptation technique to make the English voice sound more similar to the Thai voice. To overcome a phone mapping proble...
Comparison of chironomic stylization versus statistical modeling of prosody for expressive speech synthesis
Chironomic stylization is the process of real-time modification of intonation contours (f0 and tempo) using drawing/writing gestures with a stylus on a graphic tablet. The question addressed in this research is whether hand-made intonation stylization could improve or degrade expressivity and overall quality, compared to statistical modeling of prosody. A system for expressive TTS in French bas...
Syllable HMM based Mandarin TTS and comparison with concatenative TTS
This paper introduces a Syllable HMM based Mandarin TTS system. 10-state left-to-right HMMs are used to model each syllable. We leverage the corpus and the front end of a concatenative TTS system to build the Syllable HMM based TTS system. Furthermore, we utilize the unique consonant/vowel structure of Mandarin syllable to improve the voiced/unvoiced decision of HMM states. Evaluation results s...
Pronunciation lexicon adaptation for TTS voice building
This paper describes reducing phone label errors in TTS voice building by means of modeling of speaker pronunciation variants. Each speaker has his or her own unique pronunciations (and context-dependent variations), so that no one standard lexicon is able to cover all of the speaker’s variations. Creating speaker-dependent pronunciation lexicons for automatic speech labeling of our TTS voice d...
EFL Pronunciation Teaching: A Theoretical Review
This study aims to represent the developing status of pronunciation teaching and presents the current perspectives on pronunciation learning and teaching, coupled with innovative approaches and techniques/activities. It is argued that pronunciation teaching methodologies have changed over decades since the Reform Movement. The exact status of teaching pronunciation appeared first in the Audio L...